-
Notifications
You must be signed in to change notification settings - Fork 6
Support for Azure workload identity in AKS and Arc clusters #141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for Azure workload identity in AKS and Arc clusters #141
Conversation
|
This is awesome, @agreaves-ms! Thank you so much for looking into this. The team will be going through this change and thinking through what else we might need to change. Some high level thoughts is we probably need to give you more "types" of Credentials (e.g. WorkloadIdentityCredential, DataAccessCredential, ConnectionStringCredential, etc). Another thing we will look at as a team is changes necessary to our service and |
This is a great idea, I wanted to try and make a PR that was small and focused without refactoring to much of your codebase. Mainly to get the ideas through. Having a new credential type would be the most ideal, I'm needing to also alter the client_config.py to support DataSet upload and also DataSet download from both the osmo cli (if I'm a user running locally with I'll update this PR with that change as well as a separate commit and then point you at it. I was trying to think through a more agnostic naming scheme for this since Workload Identity is really an Azure and GCP term whereas AWS uses something like IRSA. I was thinking of going with Also, feel free to not take this PR, and instead implement the logic however way makes the most sense for your teams architecture that you've planned out. Thank you again for looking into this! |
|
@fernandol-nvidia I've pushed an extra commit to further address workload identity support for input and output datasets, both on upload from the osmo-cli and then running in the workflow. Please take a look at that as well when considering implementation. I've now fully tested this in my own environment and I'm able to use my authentication with Feel free to use this work however your team needs. I'm happy to further test any implementation that you all make available as well. |
|
Hey @agreaves-ms, thanks for the update and apologies for the delay. I spent the past couple of days doing a refactor of how OSMO data operations work w.r.t to different credential strategies. I have a pending PR at #159. The tl;dr is that we are differentiating I believe this would give you the necessary abstractions/interface to add workload identity support for Azure. More specifically: Defining the behavior at OSMO/src/lib/data/storage/backends/azure.py Lines 274 to 287 in 682dbf3
And overriding this method on how to decide if workload identity should be available for a particular backend: OSMO/src/lib/data/storage/backends/common.py Lines 279 to 297 in 682dbf3
Hopefully, this frees you from having to update so many unrelated files just to get Azure workload identity working :) Please feel free to add any feedback/suggestions that would be helpful for you. I will keep you updated as things progress and can help you rebase this PR to absorb the new changes. |
f2d996c to
6bd2877
Compare
|
@fernandol-nvidia I've update this PR with the changes that have since been added. I needed to make some additional changes to parts that would allow for DataCredential instead of StaticDataCredential (only) as well. Please review, I built this and verified all the azure workload identity, RBAC, and local auth works with these changes. Feel free to take over this PR as well as needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy 2026 :)
Nice, this looks great! I've added some minor comments which will help slim down the changes necessary.
Given that you've already tested this, I feel pretty confident with the PR. Once comments are addressed, I will approve!
- implement service account creation with customizable name and annotations - enhance service templates to support extra pod labels for various services - update Azure backend to utilize DefaultAzureCredential for authentication - add tests for Azure credential extraction and client creation
…Storage - add function to extract AccountKey from connection string - update AzureBlobStorageClient to handle different credential types
…TaskGroup - change StaticDataCredential to DataCredential in get_all_data_creds method - update fetch_creds function signature to use DataCredential
…d account URL - remove deprecated storage account extraction function - modify create_client to accept storage_account and account_url parameters - update AzureBlobStorageClientFactory to use new parameters - adjust tests to reflect changes in client creation 🔒 - Generated by Copilot
30edbb7 to
d66d6a8
Compare
…ent function 🔧 - Generated by Copilot
…eation 🔧 - Generated by Copilot
|
@fernandol-nvidia I finally had some time to be able to make these changes and redeploy to test them. Please review and let me know if any additional changes are needed. |
fernandol-nvidia
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me! Thank you so much for making the changes.
* allow flexible squid proxy replicas (#241) * allow flexible squid proxy replicas * fix * Efficient Workflow Cleanup through Using Async Operations for Log Migration (#167) * Improving Performance for Uploading Workflow Artifacts in Worker Jobs * Cleanup * Add progress writing after upload * Add dependency in Bazel BUILD * Add type to mypy requirements * Update mypy requirements * Add to mypy_cli BUILD * Fix lint * Comment * Use constant to define semaphor and storage client executor count * #244 - Use last login url if url is not specified (#245) * Use last login url if url is not specified * print message * Cannot select any text inside modals or slideouts (#248) * Video html element not changin when selecting different video files in the UI for OSMO dataset (#249) * sync-feature-branches: fix no conflict case, allow single branch to be synced (#252) * Fix sync-feature-branches with no merge conflicts * Allow a single branch to be specified for sync-feature-branches * Perform operations as OSMO CI Bot * Add external label when the PR is created * extract issue number * add test cases (#247) * Allow PR checks to run on release branches (#264) * Database Pooling in Postgres Singleton Across Services (#251) * Initial commit for database pooling * Update set_session * Fix lint * Update PostgresConnector to have semaphor to control connections * Lint fix * Fix number of maxconn for test * Address comments * Add Go Postgres utils (#272) * #148 - Auth Project Design Documents (#165) * add args to postgres (#282) * #267 - cloud deployment scripts (#268) * script to create azure resources and deploy * Remove auto-generated values files from tracking - Added .gitignore to ignore values/, *.env files - Removed values/*.yaml files from git (auto-generated during deployment) * add aws script * add aws script * add copyright * update copyright * Support for Azure workload identity in AKS and Arc clusters (#141) * feat(src): add Azure service account and extra pod labels configuration - implement service account creation with customizable name and annotations - enhance service templates to support extra pod labels for various services - update Azure backend to utilize DefaultAzureCredential for authentication - add tests for Azure credential extraction and client creation * feat(src): extract account key from connection string for Azure Blob Storage - add function to extract AccountKey from connection string - update AzureBlobStorageClient to handle different credential types * feat(test): add tests for account key extraction from Azure connection strings * chore: clean up linting issues for tests * refactor(src): update data credential types in PostgresConnector and TaskGroup - change StaticDataCredential to DataCredential in get_all_data_creds method - update fetch_creds function signature to use DataCredential * feat(src): update Azure client creation to include storage account and account URL - remove deprecated storage account extraction function - modify create_client to accept storage_account and account_url parameters - update AzureBlobStorageClientFactory to use new parameters - adjust tests to reflect changes in client creation 🔒 - Generated by Copilot * refactor(src): mark storage_account parameter as unused in create_client function 🔧 - Generated by Copilot * refactor(src): remove unused storage_account parameter from client creation 🔧 - Generated by Copilot * Fix conflicts --------- Co-authored-by: Vivian Pan <[email protected]> Co-authored-by: ethany-nv <[email protected]> Co-authored-by: RyaliNvidia <[email protected]> Co-authored-by: patclarknvidia <[email protected]> Co-authored-by: Ethan Look-Potts <[email protected]> Co-authored-by: xutongNV <[email protected]> Co-authored-by: Allen Greaves <[email protected]>
* allow flexible squid proxy replicas (#241) * allow flexible squid proxy replicas * fix * Efficient Workflow Cleanup through Using Async Operations for Log Migration (#167) * Improving Performance for Uploading Workflow Artifacts in Worker Jobs * Cleanup * Add progress writing after upload * Add dependency in Bazel BUILD * Add type to mypy requirements * Update mypy requirements * Add to mypy_cli BUILD * Fix lint * Comment * Use constant to define semaphor and storage client executor count * #244 - Use last login url if url is not specified (#245) * Use last login url if url is not specified * print message * Cannot select any text inside modals or slideouts (#248) * Video html element not changin when selecting different video files in the UI for OSMO dataset (#249) * sync-feature-branches: fix no conflict case, allow single branch to be synced (#252) * Fix sync-feature-branches with no merge conflicts * Allow a single branch to be specified for sync-feature-branches * Perform operations as OSMO CI Bot * Add external label when the PR is created * extract issue number * add test cases (#247) * Allow PR checks to run on release branches (#264) * Database Pooling in Postgres Singleton Across Services (#251) * Initial commit for database pooling * Update set_session * Fix lint * Update PostgresConnector to have semaphor to control connections * Lint fix * Fix number of maxconn for test * Address comments * Add Go Postgres utils (#272) * #148 - Auth Project Design Documents (#165) * add args to postgres (#282) * #267 - cloud deployment scripts (#268) * script to create azure resources and deploy * Remove auto-generated values files from tracking - Added .gitignore to ignore values/, *.env files - Removed values/*.yaml files from git (auto-generated during deployment) * add aws script * add aws script * add copyright * update copyright * Support for Azure workload identity in AKS and Arc clusters (#141) * feat(src): add Azure service account and extra pod labels configuration - implement service account creation with customizable name and annotations - enhance service templates to support extra pod labels for various services - update Azure backend to utilize DefaultAzureCredential for authentication - add tests for Azure credential extraction and client creation * feat(src): extract account key from connection string for Azure Blob Storage - add function to extract AccountKey from connection string - update AzureBlobStorageClient to handle different credential types * feat(test): add tests for account key extraction from Azure connection strings * chore: clean up linting issues for tests * refactor(src): update data credential types in PostgresConnector and TaskGroup - change StaticDataCredential to DataCredential in get_all_data_creds method - update fetch_creds function signature to use DataCredential * feat(src): update Azure client creation to include storage account and account URL - remove deprecated storage account extraction function - modify create_client to accept storage_account and account_url parameters - update AzureBlobStorageClientFactory to use new parameters - adjust tests to reflect changes in client creation 🔒 - Generated by Copilot * refactor(src): mark storage_account parameter as unused in create_client function 🔧 - Generated by Copilot * refactor(src): remove unused storage_account parameter from client creation 🔧 - Generated by Copilot * Add new project proposal to describe nvlink + topology aware scheduling (#211) * Add new project proposal to describe nvlink + topology aware scheduling * Split design into two docs * Finish docs and add some updates from feedback * Add some open items * OSMO-6044: Application error when closing Task Details after switching Events view from Task to Workflow (#315) * add redis utlis, update postgres utils (#313) * add redis utlis, update postgres utils * add deps * Fix missing seperator in the test runner roles (#320) * fix * remove * fix --------- Co-authored-by: Vivian Pan <[email protected]> Co-authored-by: ethany-nv <[email protected]> Co-authored-by: RyaliNvidia <[email protected]> Co-authored-by: patclarknvidia <[email protected]> Co-authored-by: Ethan Look-Potts <[email protected]> Co-authored-by: xutongNV <[email protected]> Co-authored-by: Allen Greaves <[email protected]> Co-authored-by: ecolternv <[email protected]> Co-authored-by: tdewanNvidia <[email protected]>
Description
Adds Azure AKS Workload Identity support, allowing pods to authenticate to Azure Blob Storage without connection strings or managed secrets. Uses
DefaultAzureCredentialfor token-based auth and updates Helm charts to support workload identity configuration.Related to Issue #153
Changes
Helm charts
serviceAccount.create- toggle chart-managed ServiceAccount creationserviceAccount.name- specify pre-provisioned SA nameserviceAccount.annotations- add workload identity annotations (e.g.,azure.workload.identity/client-id)extraPodLabelsto services for workload identity labels (azure.workload.identity/use: "true")imagePullSecretsoptional when using workload identityAzure storage backend
AzureBlobStorageClientsupports both connection string and token-based authDefaultAzureCredentialsupport for workload identityaccess_key_id,access_key) are now optionalCredentials
BasicDataCredentialfields are now optionalget_access_key_value()for safe credential retrievalPyInstaller
azure.identity,msal, andmsal_extensionsto Azure hooks for CLI workload identity supportTests
Usage
Checklist